Neural Network Classifier

Neural networks can learn non-linear decision boundaries. In this notebook we train a small network to classify iris species and compare it against a logistic regression baseline.


In [5]:
%matplotlib inline
import matplotlib.pyplot as plt
import seaborn as sns
import numpy as np
import pandas as pd

from sklearn.model_selection import train_test_split  # model_selection replaces the removed cross_validation module
from sklearn.linear_model import LogisticRegressionCV
from sklearn import datasets


from keras.models import Sequential
from keras.layers.core import Dense, Activation
from keras.utils import np_utils

Load Iris Data


In [8]:
iris = datasets.load_iris()
iris_df = pd.DataFrame(data= np.c_[iris['data'], iris['target']],
                     columns= iris['feature_names'] + ['target'])

In [9]:
iris_df.head()


Out[9]:
   sepal length (cm)  sepal width (cm)  petal length (cm)  petal width (cm)  target
0                5.1               3.5                1.4               0.2      0.0
1                4.9               3.0                1.4               0.2      0.0
2                4.7               3.2                1.3               0.2      0.0
3                4.6               3.1                1.5               0.2      0.0
4                5.0               3.6                1.4               0.2      0.0

Targets 0, 1, 2 correspond to three species: setosa, versicolor, and virginica.
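
For reference, the numeric codes can be mapped back to species names via the dataset's target_names attribute. A quick sketch (not a cell from the original notebook):

# map each numeric target code to its species name; `iris` is the Bunch loaded above
species = dict(enumerate(iris['target_names']))
print(species)  # -> 0: setosa, 1: versicolor, 2: virginica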


In [17]:
sns.pairplot(iris_df, hue="target")


Out[17]:
<seaborn.axisgrid.PairGrid at 0x12059fef0>

In [22]:
X = iris_df.values[:, :4]
Y = iris_df.values[: , 4]

Split into Training and Testing


In [23]:
train_X, test_X, train_Y, test_Y = train_test_split(X, Y, train_size=0.5, random_state=0)

Let's test out a Logistic Regression Classifier


In [24]:
lr = LogisticRegressionCV()
lr.fit(train_X, train_Y)


Out[24]:
LogisticRegressionCV(Cs=10, class_weight=None, cv=None, dual=False,
           fit_intercept=True, intercept_scaling=1.0, max_iter=100,
           multi_class='ovr', n_jobs=1, penalty='l2', random_state=None,
           refit=True, scoring=None, solver='lbfgs', tol=0.0001, verbose=0)

In [26]:
print("Accuracy = {:.2f}".format(lr.score(test_X, test_Y)))


Accuracy = 0.83

Let's Train a Neural Network Classifier


In [27]:
# Encode the output as a one-hot vector,
# since this is the format the network's softmax output uses
def one_hot_encode_object_array(arr):
    '''One hot encode a numpy array of objects (e.g. strings)'''
    uniques, ids = np.unique(arr, return_inverse=True)
    return np_utils.to_categorical(ids, len(uniques))

train_y_ohe = one_hot_encode_object_array(train_Y)
test_y_ohe = one_hot_encode_object_array(test_Y)
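
A quick look at what the encoding produces (an illustrative check, not part of the original notebook): each label becomes a length-3 vector with a single 1 in the column of its class.

print(train_y_ohe.shape)  # (75, 3) with this 50/50 split: one row per training sample, one column per class
print(train_y_ohe[0])     # a one-hot row, e.g. [0. 0. 1.] when that sample belongs to class 2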

Defining the Network

  • we have four features and three classes
  • the input layer must have 4 units
  • the output layer must have 3 units
  • we'll add a single hidden layer with 16 units

In [28]:
model = Sequential()
model.add(Dense(16, input_shape=(4,)))
model.add(Activation("sigmoid"))

In [29]:
# define output layer
model.add(Dense(3))
# softmax is used here because there are three classes: it outputs a probability
# distribution over all of them (a single sigmoid output only handles two classes)
model.add(Activation("softmax"))

In [30]:
# define loss function and optimization
model.compile(optimizer="adam", loss="categorical_crossentropy", metrics=["accuracy"])

What's happening here?

  • optimizer: the classic example is stochastic gradient descent (repeatedly stepping in the direction of steepest descent)
    • ADAM (the one selected above) stands for Adaptive Moment Estimation
    • it is similar to stochastic gradient descent, but keeps exponentially decaying averages of past gradients and uses them in its update rule
  • loss: classification error or mean squared error would also work
    • categorical cross-entropy is generally preferred with a softmax output because it yields better-behaved gradients (see the sketch below)
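
To make the loss concrete: for a single one-hot-encoded sample, categorical cross-entropy is just the negative log of the probability the network assigns to the true class. A small numpy sketch (illustrative values, not from the notebook):

import numpy as np

y_true = np.array([0.0, 1.0, 0.0])       # one-hot target: the sample belongs to class 1
y_pred = np.array([0.10, 0.80, 0.10])    # softmax output of the network for this sample
loss = -np.sum(y_true * np.log(y_pred))  # only the true class's probability contributes
print(loss)                              # ~0.22; shrinks as the true class gets more probability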

In [32]:
model.fit(train_X, train_y_ohe, epochs=100, batch_size=1, verbose=0)


Out[32]:
<keras.callbacks.History at 0x12243cdd8>

In [33]:
loss, accuracy = model.evaluate(test_X, test_y_ohe, verbose=0)
print("Accuracy = {:.2f}".format(accuracy))


Accuracy = 0.97

Nice! Much better performance than logistic regression!

How about training with stochastic gradient descent?


In [34]:
stochastic_net = Sequential()
stochastic_net.add(Dense(16, input_shape=(4,)))
stochastic_net.add(Activation("sigmoid"))

stochastic_net.add(Dense(3))

stochastic_net.add(Activation("softmax"))
stochastic_net.compile(optimizer="sgd", loss="categorical_crossentropy", metrics=["accuracy"])

In [35]:
stochastic_net.fit(train_X, train_y_ohe, epochs=100, batch_size=1, verbose=0)


Out[35]:
<keras.callbacks.History at 0x12255d9b0>

In [36]:
loss, accuracy = stochastic_net.evaluate(test_X, test_y_ohe, verbose=0)
print("Accuracy = {:.2f}".format(accuracy))


Accuracy = 0.93

Based on Mike Williams' introductory tutorial on Safari Books Online.

